In [1]:
# Initialize Notebook
%run ../library/v1.1.7/init.ipy
HTML('''<script> code_show=true;  function code_toggle() {  if (code_show){  $('div.input').hide();  } else {  $('div.input').show();  }  code_show = !code_show }  $( document ).ready(code_toggle); </script> <form action="javascript:code_toggle()"><input type="submit" value="Toggle Code"></form>''')

Acute Coronary Syndrome Drug Discovery Analysis¶


Introduction¶

This notebook contains an analysis of a user-submitted RNA-seq dataset, created using BioJupies. For more information on BioJupies, please visit http://biojupies.cloud. If the notebook is not correctly displayed in your browser, please visit our Notebook Troubleshooting Guide.

Table of Contents¶

The notebook is divided into the following sections:

  1. Load Dataset - Loads and previews the input dataset in the notebook environment.
  2. PCA - Linear dimensionality reduction technique to visualize similarity between samples
  3. Clustergrammer - Interactive hierarchical clustering heatmap visualization
  4. Library Size Analysis - Analysis of readcount distribution for the samples within the dataset
  5. Differential Expression Table - Differential expression analysis between two groups of samples
  6. Volcano Plot - Plot the logFC and logP values resulting from a differential expression analysis
  7. MA Plot - Plot the logFC and average expression values resulting from a differential expression analysis
  8. Enrichr Links - Links to enrichment analysis results of the differentially expressed genes via Enrichr
  9. Gene Ontology Enrichment Analysis - Identifies Gene Ontology terms which are enriched in the differentially expressed genes
  10. Pathway Enrichment Analysis - Identifies biological pathways which are enriched in the differentially expressed genes
  11. Transcription Factor Enrichment Analysis - Identifies transcription factors whose targets are enriched in the differentially expressed genes
  12. Kinase Enrichment Analysis - Identifies protein kinases whose substrates are enriched in the differentially expressed genes
  13. L1000CDS2 Query - Identifies small molecules which mimic or reverse a given differential gene expression signature
  14. L1000FWD Query - Projects signatures on a 2-dimensional visualization of the L1000 signature database

Results¶

1. Load Dataset¶

Here we upload a user-submitted dataset.

In [2]:
# Load dataset
dataset = load_dataset(source='upload', uid='ET9cTREfrNC')

# Preview expression data
preview_data(dataset)
Gene      Ctrl1      Ctrl2      Ctrl3      ACS1       ACS2       ACS3       SA1        SA2        SA3
DPM1      7.255649   7.375353   3.367295   12.250352  7.004579   9.096304   5.764230   7.838643   8.848128
SCYL3     1.567150   1.418937   1.364962   2.035762   1.611222   1.720633   2.029081   1.099374   1.578072
C1orf112  1.240987   0.890164   0.971860   1.229537   0.787981   0.724214   0.964807   0.747368   1.193794
FGR       67.533774  49.635314  52.748331  82.210413  36.501235  74.861904  65.817821  56.638353  41.740712
FUCA2     1.857609   2.031012   0.711422   2.347128   3.907391   1.996383   2.331961   1.555359   1.147055

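Table 1 | RNA-seq expression data. The table displays the first 5 rows of the quantified RNA-seq expression dataset. Rows represent genes, columns represent samples, and values show the quantified expression of each gene in each sample (the values here are non-integer, so they are presumably normalized rather than raw mapped-read counts).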

In [3]:
# Display metadata
display_metadata(dataset)
Sample Condition
Ctrl1 Healthy
Ctrl2 Healthy
Ctrl3 Healthy
ACS1 ACS
ACS2 ACS
ACS3 ACS
SA1 Stable_Angina
SA2 Stable_Angina
SA3 Stable_Angina

Table 2 | Sample metadata. The table displays the metadata associated with the samples in the RNA-seq dataset. Rows represent RNA-seq samples, columns represent metadata categories.

In [4]:
# Configure signatures
dataset['signature_metadata'] = {
    'Control vs Perturbation': {
        'A': ['Ctrl1', 'Ctrl2', 'Ctrl3'],
        'B': ['ACS1', 'ACS2', 'ACS3']
    }
}

# Generate signatures
for label, groups in dataset['signature_metadata'].items():
    signatures[label] = generate_signature(group_A=groups['A'], group_B=groups['B'], method='limma', dataset=dataset)
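To make the idea of a signature concrete, here is a minimal sketch of a two-group comparison for a single gene. It is not the limma method used above: limma fits a linear model with a moderated t-statistic, whereas this sketch computes a plain log2 fold change and a Welch t-statistic. The helper names and the pseudocount of 1 are assumptions made for illustration; the FGR values are taken from Table 1.

```python
import math
from statistics import mean, variance


def log2_values(values, pseudocount=1.0):
    """Log2-transform expression values; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def log2_fold_change(group_a, group_b):
    """Mean log2 expression in group B minus mean log2 expression in group A."""
    return mean(log2_values(group_b)) - mean(log2_values(group_a))


def welch_t(group_a, group_b):
    """Plain Welch t-statistic on log2 values (not limma's moderated t)."""
    log_a, log_b = log2_values(group_a), log2_values(group_b)
    standard_error = math.sqrt(
        variance(log_a) / len(log_a) + variance(log_b) / len(log_b)
    )
    return (mean(log_b) - mean(log_a)) / standard_error


# FGR row from Table 1: group A = Ctrl1-3, group B = ACS1-3
fgr_ctrl = [67.533774, 49.635314, 52.748331]
fgr_acs = [82.210413, 36.501235, 74.861904]
fgr_logfc = log2_fold_change(fgr_ctrl, fgr_acs)
fgr_t = welch_t(fgr_ctrl, fgr_acs)
print(f"FGR logFC={fgr_logfc:.3f}, t={fgr_t:.3f}")
```

A positive logFC means higher expression in group B (ACS) than in group A (healthy controls), matching the A/B orientation configured above.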

2. PCA¶

Principal Component Analysis (PCA) is a statistical technique used to identify global patterns in high-dimensional datasets. It is commonly used to explore the similarity of biological samples in RNA-seq datasets. To achieve this, gene expression values are transformed into Principal Components (PCs), a set of linearly uncorrelated features which represent the most relevant sources of variance in the data, and subsequently visualized using a scatter plot.

In [5]:
# Run analysis
results['pca'] = analyze(dataset=dataset, tool='pca', nr_genes=500, normalization='quantile', z_score=True, plot_type='interactive')

# Display results
plot(results['pca'])

**Figure 1 | Principal Component Analysis results.** The figure displays an interactive, three-dimensional scatter plot of the first three Principal Components (PCs) of the data. Each point represents an RNA-seq sample. Samples with similar gene expression profiles are closer in the three-dimensional space. If provided, sample groups are indicated using different colors, allowing for easier interpretation of the results. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.
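The `nr_genes=500` and `z_score=True` arguments suggest that the data are reduced to a subset of genes and standardized before PCA. The sketch below shows one plausible version of that preprocessing, assuming that the most variable genes are kept and that each gene is z-scored across samples. The function names and toy values are hypothetical, and the actual `analyze` implementation may differ.

```python
from statistics import mean, pstdev, pvariance


def top_variable_genes(expression, nr_genes):
    """Return the names of the nr_genes genes with the highest variance across samples."""
    ranked = sorted(expression, key=lambda gene: pvariance(expression[gene]), reverse=True)
    return ranked[:nr_genes]


def z_score(values):
    """Center a gene's values on zero and scale them to unit standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A gene that does not vary carries no information for PCA
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


# Toy expression matrix: gene -> values across three samples (hypothetical)
toy_expression = {
    "geneA": [1.0, 1.0, 1.0],
    "geneB": [0.0, 5.0, 10.0],
    "geneC": [1.0, 2.0, 3.0],
}
selected = top_variable_genes(toy_expression, nr_genes=2)
standardized = {gene: z_score(toy_expression[gene]) for gene in selected}
print(selected)
```

After standardization every selected gene contributes on the same scale, so highly expressed genes do not dominate the principal components.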


3. Clustergrammer¶

Clustergrammer is a web-based tool for visualizing and analyzing high-dimensional data as interactive and hierarchically clustered heatmaps. It is commonly used to explore the similarity between samples in an RNA-seq dataset. In addition to identifying clusters of samples, it also helps identify the genes that contribute to the clustering.

In [6]:
# Run analysis
results['clustergrammer'] = analyze(dataset=dataset, tool='clustergrammer', nr_genes=500, normalization='quantile', z_score=True)

# Display results
plot(results['clustergrammer'])

**Figure 2 | Clustergrammer analysis.** The figure contains an interactive heatmap displaying gene expression for each sample in the RNA-seq dataset. Every row of the heatmap represents a gene, every column represents a sample, and every cell displays normalized gene expression values. The heatmap additionally features color bars beside each column which represent prior knowledge of each sample, such as the tissue of origin or experimental treatment.
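Hierarchical clustering, the technique behind the heatmap ordering, can be illustrated with a small agglomerative procedure. The sketch below uses Euclidean distance and average linkage on toy sample profiles. These choices, the function names, and the data are assumptions for illustration; Clustergrammer's own distance and linkage settings are not shown in this notebook.

```python
import math
from itertools import combinations
from statistics import mean


def euclidean(u, v):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cluster_distance(clusters, a, b):
    """Average pairwise distance between the members of clusters a and b."""
    return mean([euclidean(u, v) for u in clusters[a] for v in clusters[b]])


def average_linkage(profiles):
    """Merge the two closest clusters repeatedly; return the list of merges."""
    clusters = {(name,): [vector] for name, vector in profiles.items()}
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(list(clusters), 2),
            key=lambda pair: cluster_distance(clusters, pair[0], pair[1]),
        )
        merges.append((a, b, cluster_distance(clusters, a, b)))
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
    return merges


# Toy sample profiles (hypothetical): two similar samples and one outlier
toy_profiles = {"s1": [0.0, 0.0], "s2": [0.0, 1.0], "s3": [10.0, 10.0]}
toy_merges = average_linkage(toy_profiles)
for left, right, distance in toy_merges:
    print(left, right, round(distance, 3))
```

The order of the merges defines the dendrogram: samples joined early are the most similar, and the final merge separates the most distinct groups.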


4. Library Size Analysis¶

In order to quantify gene expression in an RNA-seq dataset, reads generated from the sequencing step are mapped to a reference genome and subsequently aggregated into numeric gene counts. Due to experimental variation and random technical noise, samples in an RNA-seq dataset often have variable amounts of total RNA. Library size analysis calculates and displays the total number of reads mapped for each sample in the RNA-seq dataset, facilitating the identification of outlying samples and the assessment of the overall quality of the data.

In [7]:
# Run analysis
results['library_size_analysis'] = analyze(dataset=dataset, tool='library_size_analysis', plot_type='interactive')

# Display results
plot(results['library_size_analysis'])

**Figure 3 | Library Size Analysis results.** The figure contains an interactive bar chart which displays the total number of reads mapped to each RNA-seq sample in the dataset. Additional information for each sample is available by hovering over the bars. If provided, sample groups are indicated using different colors, thus allowing for easier interpretation of the results. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.
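The library size of a sample is simply the sum of its values over all genes. A minimal sketch, using a hypothetical two-gene count matrix rather than the real dataset:

```python
def library_sizes(counts, samples):
    """Sum each sample's counts over all genes.

    counts maps a gene name to a list of values ordered like `samples`.
    """
    totals = {sample: 0.0 for sample in samples}
    for gene_values in counts.values():
        for sample, value in zip(samples, gene_values):
            totals[sample] += value
    return totals


# Hypothetical counts for two genes in three samples
toy_samples = ["Ctrl1", "ACS1", "SA1"]
toy_counts = {
    "geneA": [100, 250, 90],
    "geneB": [300, 150, 10],
}
toy_sizes = library_sizes(toy_counts, toy_samples)
print(toy_sizes)
```

A sample whose total is far below the others (here the hypothetical `SA1`) would be a candidate for closer quality inspection.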


5. Differential Expression Table¶

Gene expression signatures are alterations in the patterns of gene expression that occur as a result of cellular perturbations such as drug treatments, gene knock-downs or diseases. They can be quantified using differential gene expression (DGE) methods, which compare gene expression between two groups of samples to identify genes whose expression is significantly altered in the perturbation. The signature table is used to interactively display the results of such analyses.

In [8]:
# Initialize results
results['signature_table'] = {}

# Loop through signatures
for label, signature in signatures.items():

    # Run analysis
    results['signature_table'][label] = analyze(signature=signature, tool='signature_table', signature_label=label)

    # Display results
    plot(results['signature_table'][label])
Gene logFC AveExpr P-value FDR
WDR74 -7.39 5.21 0.000097 0.179365
SNHG3 -0.96 7.77 0.001291 0.995675
RHOB -0.82 6.25 0.005625 0.995675
HIST1H4C -0.74 6.43 0.007307 0.995675
TUBB2A -3.09 5.32 0.008873 0.995675
PDZK1IP1 -1.14 7.17 0.009075 0.995675
RAB7A 0.60 5.71 0.011606 0.995675
HIST1H2AD -1.38 4.30 0.011814 0.995675
HIST1H3A -0.54 4.47 0.012083 0.995675
VIM 0.62 6.80 0.012190 0.995675
ANXA5 0.59 4.32 0.012207 0.995675
TET2 0.82 4.46 0.012900 0.995675
GZMH 0.82 5.46 0.013602 0.995675
CX3CR1 0.55 5.62 0.015798 0.995675
HIST2H3D -0.61 5.18 0.017067 0.995675
CMTM2 -0.68 5.36 0.017806 0.995675
PRELID1 -0.50 5.21 0.018348 0.995675
HIST1H4A -0.76 4.38 0.018568 0.995675
TNFAIP2 -0.54 6.08 0.019652 0.995675
LILRB3 -0.52 4.89 0.019689 0.995675
SLC38A5 -1.02 5.73 0.020209 0.995675
RN7SL1 -0.71 17.53 0.020740 0.995675
FCHO2 -0.50 5.31 0.022525 0.995675
RPS3A -0.62 6.79 0.023122 0.995675
CTA-363E6.6 -0.69 7.38 0.026010 0.995675
RASSF2 -0.82 6.05 0.027415 0.995675
VCAN 0.61 4.53 0.029072 0.995675
TMEM56 -0.67 5.28 0.029711 0.995675
IL10RA 0.53 4.45 0.029950 0.995675
ACTR2 0.70 6.19 0.030598 0.995675
PIM3 -0.56 5.85 0.032119 0.995675
RPL41 -0.58 8.69 0.033414 0.995675
NSUN3 -0.45 4.73 0.034350 0.995675
IER2 -0.91 5.97 0.034686 0.995675
AL627309.1 0.62 6.68 0.035337 0.995675
FFAR2 -0.58 6.08 0.035393 0.995675
ZFP36 -0.84 7.86 0.035478 0.995675
PF4V1 0.99 6.52 0.036937 0.995675
NATD1 -0.76 5.24 0.037278 0.995675
HIST1H3F -0.61 4.49 0.037400 0.995675
GZMB 0.48 4.61 0.037598 0.995675
RNF182 -2.81 5.01 0.038297 0.995675
HIST1H4B -0.71 6.17 0.038620 0.995675
HK1 0.53 5.04 0.040146 0.995675
CKLF -0.53 5.08 0.042031 0.995675
GUK1 -0.71 6.13 0.042417 0.995675
HIST1H2AE -0.59 6.32 0.045203 0.995675
RHBDD1 -0.51 4.44 0.045225 0.995675
MFSD14B 0.53 4.63 0.046181 0.995675
SAMD9L -0.46 4.38 0.046215 0.995675
TPM3 0.39 4.56 0.046376 0.995675
SCARNA2 -0.88 9.66 0.046846 0.995675
STOM 0.50 6.38 0.047037 0.995675
B2M 0.79 7.65 0.047087 0.995675
NKG7 0.61 6.93 0.047228 0.995675
PTMS -0.41 4.74 0.050363 0.995675
DEFA1B -1.12 5.38 0.051087 0.995675
PRKCB 0.37 5.12 0.051319 0.995675
DEFA1 -1.13 5.40 0.051532 0.995675
SF3B5 -0.48 4.83 0.053363 0.995675
IGF2BP2 0.38 5.26 0.053723 0.995675
C20orf24 -0.61 5.21 0.055030 0.995675
RPPH1 -0.51 13.65 0.055855 0.995675
HIST1H4J -0.52 7.38 0.057366 0.995675
TERC -0.80 7.00 0.058360 0.995675
UTRN 0.40 4.84 0.058984 0.995675
IFITM1 -0.55 7.55 0.059893 0.995675
SMOX -0.54 4.69 0.059907 0.995675
HIST1H2AG -0.50 4.32 0.060175 0.995675
FOS -0.99 6.94 0.060585 0.995675
YWHAQ 0.46 4.62 0.061978 0.995675
HBM -0.48 7.46 0.063238 0.995675
MAN1A1 0.46 5.00 0.063314 0.995675
HNRNPUL1 0.39 5.07 0.064120 0.995675
IKZF1 0.35 5.18 0.064376 0.995675
EIF5A -0.51 5.05 0.064523 0.995675
GOLPH3 0.40 4.41 0.065036 0.995675
B3GNT8 -0.53 4.49 0.065898 0.995675
TUBA4A -0.43 4.68 0.067932 0.995675
HBQ1 -0.46 7.04 0.068039 0.995675
TGOLN2 0.33 5.25 0.069226 0.995675
PRPF6 0.41 4.32 0.072203 0.995675
RALB -0.66 4.60 0.072253 0.995675
HIST1H2BG -0.49 5.18 0.073367 0.995675
PRF1 0.53 6.53 0.073538 0.995675
SCARNA10 -0.59 11.67 0.074411 0.995675
RNF5 -0.50 4.57 0.074519 0.995675
NUSAP1 -0.56 4.58 0.075045 0.995675
GID8 0.49 4.50 0.075104 0.995675
HIST1H4E -0.50 6.46 0.075666 0.995675
TBX21 0.51 4.54 0.075714 0.995675
TBCEL -0.65 5.25 0.076566 0.995675
TUG1 0.54 4.04 0.077688 0.995675
FCER1G -0.57 7.12 0.077887 0.995675
SAMD9 -0.34 4.97 0.079422 0.995675
RPS26 -1.33 4.84 0.080532 0.995675
LYL1 -0.59 7.01 0.081213 0.995675
C9orf78 0.47 6.45 0.082994 0.995675
ARF1 0.40 6.76 0.083932 0.995675
SKAP2 -0.51 4.61 0.084574 0.995675

**Table 3 | Differential Expression Table.** The table displays the gene expression signature generated from a differential gene expression analysis. Every row of the table represents a gene; the columns display the estimated measures of differential expression. Links to external resources containing additional information for each gene are also provided.
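The FDR column is a multiple-testing adjustment of the P-value column. The sketch below implements the Benjamini-Hochberg procedure, assuming that is the adjustment used here (it is limma's default; the notebook does not state the method explicitly). Because the procedure takes a running minimum from the largest P-value downward, many genes can end up sharing the same adjusted value, which is consistent with the repeated 0.995675 entries in Table 3.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Toy p-values (hypothetical)
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_fdr = benjamini_hochberg(toy_pvalues)
print([round(q, 4) for q in toy_fdr])
```

With only a handful of small P-values among a large number of tested genes, the adjusted values stay high, so none of the genes listed in Table 3 would pass a conventional FDR cutoff of 0.05.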


6. Volcano Plot¶

Volcano plots are a type of scatter plot commonly used to display the results of a differential gene expression analysis. They can be used to quickly identify genes whose expression is significantly altered in a perturbation, and to assess the global similarity of gene expression in two groups of biological samples. Each point in the scatter plot represents a gene; the axes display the significance versus fold-change estimated by the differential expression analysis.

In [9]:
# Initialize results
results['volcano_plot'] = {}

# Loop through signatures
for label, signature in signatures.items():

    # Run analysis
    results['volcano_plot'][label] = analyze(signature=signature, tool='volcano_plot', signature_label=label, pvalue_threshold=0.05, logfc_threshold=1.5, plot_type='interactive')

    # Display results
    plot(results['volcano_plot'][label])

**Figure 4 | Volcano Plot.** The figure contains an interactive scatter plot which displays the log2-fold changes and statistical significance of each gene calculated by performing a differential gene expression analysis. Every point in the plot represents a gene. Red points indicate significantly up-regulated genes; blue points indicate significantly down-regulated genes. Additional information for each gene is available by hovering over it. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.
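The coordinates and coloring of a volcano plot follow directly from the differential expression table. The sketch below assumes that `pvalue_threshold` is applied to the unadjusted P-value and `logfc_threshold` to the absolute logFC; the notebook does not confirm either detail, and the function name is hypothetical. The example rows are taken from Table 3.

```python
import math


def volcano_point(logfc, pvalue, pvalue_threshold=0.05, logfc_threshold=1.5):
    """Return (x, y, label) for one gene on a volcano plot."""
    y = -math.log10(pvalue)  # higher means more significant
    significant = pvalue < pvalue_threshold
    if significant and logfc >= logfc_threshold:
        label = "up"
    elif significant and logfc <= -logfc_threshold:
        label = "down"
    else:
        label = "not significant"
    return logfc, y, label


# Rows from Table 3: (gene, logFC, P-value)
table_rows = [
    ("WDR74", -7.39, 0.000097),
    ("TUBB2A", -3.09, 0.008873),
    ("RAB7A", 0.60, 0.011606),
]
labels = {gene: volcano_point(logfc, p)[2] for gene, logfc, p in table_rows}
print(labels)
```

Under these assumptions WDR74 and TUBB2A would be colored as down-regulated, while RAB7A passes the P-value threshold but not the fold-change threshold.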


7. MA Plot¶

MA plots are a type of scatter plot commonly used to display the results of a differential gene expression analysis. They can be used to quickly identify genes whose expression is significantly altered in a perturbation, and to assess the global similarity of gene expression in two groups of biological samples. Each point in the scatter plot represents a gene; the axes display the average gene expression versus fold-change estimated by the differential expression analysis.

In [10]:
# Initialize results
results['ma_plot'] = {}

# Loop through signatures
for label, signature in signatures.items():

    # Run analysis
    results['ma_plot'][label] = analyze(signature=signature, tool='ma_plot', signature_label=label, pvalue_threshold=0.05, logfc_threshold=1, plot_type='interactive')

    # Display results
    plot(results['ma_plot'][label])

**Figure 5 | MA Plot.** The figure contains an interactive scatter plot which displays the average expression and log2-fold change of each gene calculated by performing differential gene expression analysis. Every point in the plot represents a gene. Red points indicate significantly up-regulated genes; blue points indicate significantly down-regulated genes. Additional information for each gene is available by hovering over it. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.
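The name "MA" comes from the two plotted quantities: M, the log ratio between the groups, and A, the average log expression. A minimal sketch of both for a single gene, given its mean expression in each group (the function name and example values are hypothetical):

```python
import math


def ma_values(mean_a, mean_b):
    """Return (M, A) for one gene from its mean expression in groups A and B."""
    log_a = math.log2(mean_a)
    log_b = math.log2(mean_b)
    m = log_b - log_a        # log2 fold change of B relative to A (y axis)
    a = (log_a + log_b) / 2  # average log2 expression (x axis)
    return m, a


# Hypothetical gene with mean expression 2 in group A and 8 in group B
m_value, a_value = ma_values(2.0, 8.0)
print(m_value, a_value)
```

In the signature table, the logFC and AveExpr columns play the roles of M and A respectively.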


8. Enrichr Links¶

Enrichment analysis is a statistical procedure used to identify biological terms which are over-represented in a given gene set. These include signaling pathways, molecular functions, diseases, and a wide variety of other biological terms obtained by integrating prior knowledge of gene function from multiple resources. Enrichr is a web-based application that performs enrichment analysis using a large collection of gene-set libraries and offers various interactive approaches to display enrichment results.

In [11]:
# Initialize results
results['enrichr'] = {}

# Loop through signatures
for label, signature in signatures.items():

    # Run analysis
    results['enrichr'][label] = analyze(signature=signature, tool='enrichr', signature_label=label, geneset_size=500, sort_genes_by='t')

    # Display results
    plot(results['enrichr'][label])
Control vs Perturbation Signature:¶
  • Upregulated: https://maayanlab.cloud/Enrichr/enrich?dataset=bf21889b6c818fff84757c13afa615c7
  • Downregulated: https://maayanlab.cloud/Enrichr/enrich?dataset=9e01a5a8929acdf7b0e2572d1ba26708

**Table 4 | Enrichr links.** The table displays links to Enrichr containing the results of enrichment analyses generated by analyzing the up-regulated and down-regulated genes from a differential expression analysis. By clicking on these links, users can interactively explore and download the enrichment results from the Enrichr website.
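The `geneset_size=500` and `sort_genes_by='t'` arguments suggest that the up-regulated and down-regulated gene sets are the top and bottom 500 genes when ranked by t-statistic. The sketch below illustrates that selection under this assumption. The function name and the toy t-statistics are hypothetical, and the real implementation may handle ties or sign filtering differently.

```python
def split_signature(t_statistics, geneset_size=500):
    """Return (upregulated, downregulated) gene lists ranked by t-statistic."""
    ranked = sorted(t_statistics, key=t_statistics.get, reverse=True)
    # Highest t-statistics first, keeping only genes with a positive effect
    up = [gene for gene in ranked[:geneset_size] if t_statistics[gene] > 0]
    # Lowest t-statistics first, keeping only genes with a negative effect
    down = [gene for gene in ranked[::-1][:geneset_size] if t_statistics[gene] < 0]
    return up, down


# Toy t-statistics (hypothetical)
toy_t = {"g1": 4.2, "g2": -3.1, "g3": 0.5, "g4": -0.2, "g5": 2.0}
toy_up, toy_down = split_signature(toy_t, geneset_size=2)
print(toy_up, toy_down)
```

Each of the two resulting lists is then submitted to Enrichr separately, which is why Table 4 contains one link per direction.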


9. Gene Ontology Enrichment Analysis¶

Gene Ontology (GO) is a major bioinformatics initiative aimed at unifying the representation of gene attributes across all species. It contains a large collection of experimentally validated and predicted associations between genes and biological terms. This information can be leveraged by Enrichr to identify the biological processes, molecular functions and cellular components which are over-represented in the up-regulated and down-regulated genes identified by comparing two groups of samples.

In [12]:
# Initialize results
results['go_enrichment'] = {}

# Loop through results
for label, enrichr_results in results['enrichr'].items():

    # Run analysis
    results['go_enrichment'][label] = analyze(enrichr_results=enrichr_results['results'], tool='go_enrichment', signature_label=label, plot_type='interactive', go_version=2025, sort_results_by='pvalue')

    # Display results
    plot(results['go_enrichment'][label])

**Figure 6 | Gene Ontology Enrichment Analysis Results.** The figure contains interactive bar charts displaying the results of the Gene Ontology enrichment analysis generated using Enrichr. The x axis indicates the -log10(P-value) for each term. Significant terms are highlighted in bold. Additional information about enrichment results is available by hovering over each bar. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.
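The P-value behind each bar measures how surprising the overlap between the input gene set and a term's gene set is. Enrichr computes it with a Fisher exact test; the sketch below shows the equivalent one-sided hypergeometric tail probability. The function name, the background size, and the example numbers are assumptions for illustration, not values used by Enrichr for this dataset.

```python
import math
from math import comb


def enrichment_pvalue(overlap, geneset_size, term_size, background_size):
    """Probability of seeing at least `overlap` term genes in a random gene set.

    One-sided hypergeometric tail: a gene set of geneset_size genes is drawn
    from background_size genes, of which term_size are annotated to the term.
    """
    total = comb(background_size, geneset_size)
    upper = min(geneset_size, term_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, geneset_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 12 of 500 input genes fall in a 100-gene term,
# with an assumed background of 20000 genes
p_value = enrichment_pvalue(12, 500, 100, 20000)
bar_length = -math.log10(p_value)  # quantity shown on the x axis
print(p_value, bar_length)
```

Because the same test is repeated for every term in a library, the resulting P-values are usually corrected for multiple testing before terms are called significant.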


10. Pathway Enrichment Analysis¶

Biological pathways are sequences of interactions between biochemical compounds which play a key role in determining cellular behavior. Databases such as KEGG, Reactome and WikiPathways contain a large number of associations between such pathways and genes. This information can be leveraged by Enrichr to identify the biological pathways which are over-represented in the up-regulated and down-regulated genes identified by comparing two groups of samples.

In [13]:
# Initialize results
results['pathway_enrichment'] = {}

# Loop through results
for label, enrichr_results in results['enrichr'].items():

    # Run analysis
    results['pathway_enrichment'][label] = analyze(enrichr_results=enrichr_results['results'], tool='pathway_enrichment', signature_label=label, plot_type='interactive', sort_results_by='pvalue')

    # Display results
    plot(results['pathway_enrichment'][label])

**Figure 7 | Pathway Enrichment Analysis Results.** The figure contains interactive bar charts displaying the results of the pathway enrichment analysis generated using Enrichr. The x axis indicates the -log10(P-value) for each term. Significant terms are highlighted in bold. Additional information about enrichment results is available by hovering over each bar. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.


11. Transcription Factor Enrichment Analysis¶

Transcription Factors (TFs) are proteins involved in the transcriptional regulation of gene expression. Databases such as ChEA and ENCODE contain a large number of associations between TFs and their transcriptional targets. This information can be leveraged by Enrichr to identify the transcription factors whose targets are over-represented in the up-regulated and down-regulated genes identified by comparing two groups of samples.

In [14]:
# Initialize results
results['tf_enrichment'] = {}

# Loop through results
for label, enrichr_results in results['enrichr'].items():

    # Run analysis
    results['tf_enrichment'][label] = analyze(enrichr_results=enrichr_results['results'], tool='tf_enrichment', signature_label=label)

    # Display results
    plot(results['tf_enrichment'][label])

A. ChEA (experimentally validated targets)¶

Rank Transcription Factor P-value FDR Target
1 AF4 26711339 ChIP-Seq SEM Human Blood Leukemia* 5.476736e-44 4.107552e-41 200 upregulated targets
2 FOXO1 25302145 ChIP-Seq T-LYMPHOCYTE Mouse* 1.757057e-30 6.588963e-28 123 upregulated targets
3 HNF1A 27111144 ChIP-Seq CD8+TCells Mouse Blood Lymphoma* 5.512847e-30 1.378212e-27 180 upregulated targets
4 RUNX1 21571218 ChIP-Seq MEGAKARYOCYTES Human* 5.790235e-27 1.085669e-24 209 upregulated targets
5 MYB 26560356 Chip-Seq TH2 Human* 1.104989e-25 1.657484e-23 116 upregulated targets
6 UTX 26944678 Chip-Seq JUKART Human* 9.332437e-25 1.166555e-22 114 upregulated targets
7 CREM 20920259 ChIP-Seq GC1-SPG Mouse* 1.180260e-23 1.264564e-21 216 upregulated targets
8 SPI1 23547873 ChIP-Seq NB4 Human* 4.473663e-22 4.194059e-20 150 upregulated targets
9 MECOM 23826213 ChIP-Seq KASUMI Mouse* 5.815919e-22 4.846600e-20 105 upregulated targets
10 SMRT 22465074 ChIP-Seq MACROPHAGES Mouse* 2.322321e-21 1.741741e-19 106 upregulated targets
11 TAL1 20566737 ChIP-Seq PRIMARY FETAL LIVER ERYTHROID Mouse* 3.979905e-21 2.713571e-19 102 upregulated targets
12 KDM2B 26808549 Chip-Seq HPB-ALL Human* 3.199896e-20 1.999935e-18 106 upregulated targets
13 MYB 26560356 Chip-Seq TH1 Human* 9.906695e-20 5.715401e-18 104 upregulated targets
14 FLI1 21571218 ChIP-Seq MEGAKARYOCYTES Human* 1.070867e-19 5.736788e-18 210 upregulated targets
15 ENL 26711339 ChIP-Seq SEM Human Blood Leukemia* 1.894027e-19 9.470134e-18 100 upregulated targets
16 GATA3 27048872 Chip-Seq THYMUS Human* 1.315219e-18 6.165089e-17 102 upregulated targets
17 NCOR 22465074 ChIP-Seq MACROPHAGES Mouse* 2.290286e-18 1.010420e-16 101 upregulated targets
18 STAT3 20064451 ChIP-Seq CD4+T Mouse* 9.908748e-18 4.128645e-16 68 upregulated targets
19 EKLF 21900194 ChIP-Seq ERYTHROCYTE Mouse* 1.301390e-18 4.828156e-16 79 downregulated targets
20 NCOR1 26117541 ChIP-Seq K562 Human* 6.399390e-17 2.526075e-15 99 upregulated targets
21 KDM2B 26808549 Chip-Seq SUP-B15 Human* 9.090108e-17 3.408790e-15 100 upregulated targets
22 CREB1 20920259 ChIP-Seq GC1-SPG Mouse* 1.182157e-16 4.221989e-15 128 upregulated targets
23 MAF 26560356 Chip-Seq TH1 Human* 2.173344e-17 5.375403e-15 100 downregulated targets
24 TCF7 22412390 ChIP-Seq EML Mouse* 7.534101e-17 1.118061e-14 99 downregulated targets
25 BRD4 27068464 Chip-Seq AML-cells Mouse* 3.946094e-16 1.345259e-14 96 upregulated targets
26 KDM2B 26808549 Chip-Seq SIL-ALL Human* 1.211862e-15 3.787068e-14 97 upregulated targets
27 ATF3 23680149 ChIP-Seq GBM1-GSC Human* 1.211862e-15 3.787068e-14 97 upregulated targets
28 RUNX1 22412390 ChIP-Seq EML Mouse* 2.199684e-15 6.599051e-14 93 upregulated targets
29 ELK3 25401928 ChIP-Seq HUVEC Human* 6.413917e-16 7.931877e-14 99 downregulated targets
30 KDM2B 26808549 Chip-Seq DND41 Human* 4.085270e-15 1.178443e-13 96 upregulated targets
31 CREB1 23762244 ChIP-Seq HIPPOCAMPUS Rat* 7.871221e-15 2.186450e-13 108 upregulated targets
32 AF4 28076791 ChIP-Seq SEM Human Blood Leukemia* 1.045512e-14 2.800478e-13 54 upregulated targets
33 GATA1 19941827 ChIP-Seq MEL86 Mouse* 2.655031e-14 6.866459e-13 89 upregulated targets
34 E2A 27217539 Chip-Seq RAMOS-Cell Line Human* 8.219155e-14 1.988505e-12 91 upregulated targets
35 RUNX2 24655370 ChIP-Seq MC3T3E1 Mouse Bone* 9.546292e-14 2.237412e-12 181 upregulated targets
36 TAL1 20887958 ChIP-Seq HPC-7 Mouse* 1.103101e-13 1.023126e-11 90 downregulated targets
37 STAT4 19710469 ChIP-ChIP TH1 Mouse* 5.504823e-13 1.251096e-11 67 upregulated targets
38 MAF 26560356 Chip-Seq TH2 Human* 1.313236e-12 2.814076e-11 89 upregulated targets
39 BRD4 28847988 ChIP-Seq BCBL1 Human Blood Lymphoma* 2.025993e-12 4.220819e-11 27 upregulated targets
40 MYB 21317192 ChIP-Seq ERMYB Mouse* 2.117441e-12 4.292111e-11 55 upregulated targets
41 VDR 24763502 ChIP-Seq THP-1 Human* 2.243678e-12 4.428312e-11 66 upregulated targets
42 SPI1 22096565 ChIP-ChIP GC-B Mouse* 6.883478e-13 4.643219e-11 67 downregulated targets
43 PPARG 20887899 ChIP-Seq 3T3-L1 Mouse* 3.043126e-12 5.852165e-11 127 upregulated targets
44 GATA1 22383799 ChIP-Seq G1ME Mouse* 4.186823e-12 7.850293e-11 87 upregulated targets
45 GATA2 19941826 ChIP-Seq K562 Human* 2.470789e-12 1.527771e-10 88 downregulated targets
46 NUCKS1 24931609 ChIP-Seq HEPATOCYTES Mouse* 1.025981e-11 1.832109e-10 45 upregulated targets
47 SPI1 22790984 ChIP-Seq ERYTHROLEUKEMIA Mouse* 3.381096e-12 1.929825e-10 86 downregulated targets
48 YY1 26981420 ChIP-Seq C2C12 Mouse Muscle* 1.313130e-11 2.290343e-10 93 upregulated targets
49 CEBPB 20176806 ChIP-Seq MACROPHAGES Mouse* 2.382947e-11 4.026335e-10 81 upregulated targets
50 KDM2B 26808549 Chip-Seq JURKAT Human* 2.415801e-11 4.026335e-10 87 upregulated targets

B. ENCODE (experimentally validated targets)¶

Rank Transcription Factor P-value FDR Target
1 POLR2AphosphoS5 G1E-ER4 mm9* 2.716623e-24 2.214047e-21 126 downregulated targets
2 TAF7 K562 hg19* 5.843769e-23 1.769709e-20 74 downregulated targets
3 NELFE K562 hg19* 6.514265e-23 1.769709e-20 48 downregulated targets
4 KAT2A GM12878 hg19* 4.406447e-22 8.978135e-20 118 downregulated targets
5 KAT2A HeLa-S3 hg19* 9.615575e-22 1.567339e-19 87 downregulated targets
6 RELA GM18505 hg19* 2.590055e-19 3.518159e-17 118 downregulated targets
7 RELA GM18526 hg19* 2.790626e-17 3.249087e-15 67 downregulated targets
8 RELA GM12878 hg19* 5.992928e-17 6.105295e-15 64 downregulated targets
9 CEBPD K562 hg19* 1.352037e-16 1.224345e-14 59 downregulated targets
10 ETS1 MEL cell line mm9* 2.604922e-17 2.125617e-14 110 upregulated targets
11 IKZF1 GM12878 hg19* 1.293902e-16 5.279121e-14 112 upregulated targets
12 CEBPB GM12878 hg19* 1.124576e-15 9.165296e-14 55 downregulated targets
13 ZMIZ1 MEL cell line mm9* 1.302623e-15 9.651254e-14 87 downregulated targets
14 POLR2AphosphoS5 MEL cell line mm9* 5.023599e-15 3.411861e-13 131 downregulated targets
15 NCOR1 K562 hg19* 6.428200e-15 1.748470e-12 108 upregulated targets
16 CHD1 CH12.LX mm9* 1.655810e-14 2.702283e-12 107 upregulated targets
17 RELA GM12891 hg19* 1.655810e-14 2.702283e-12 107 upregulated targets
18 BCLAF1 K562 hg19* 7.077089e-14 4.436791e-12 68 downregulated targets
19 TAL1 MEL cell line mm9* 6.091471e-13 8.284401e-11 113 upregulated targets
20 TAF1 MCF-7 hg19* 4.976099e-12 2.896801e-10 90 downregulated targets
21 RELA GM12892 hg19* 3.417010e-12 3.485350e-10 76 upregulated targets
22 STAT3 HeLa-S3 hg19* 2.255815e-11 1.149056e-09 98 downregulated targets
23 SPI1 GM12878 hg19* 2.660357e-11 1.275406e-09 101 downregulated targets
24 GATA1 erythroblast mm9* 2.068906e-11 1.875808e-09 98 upregulated targets
25 POLR2A liver mm9* 1.073250e-10 4.373494e-09 97 downregulated targets
26 RELA GM19193 hg19* 1.073250e-10 4.373494e-09 97 downregulated targets
27 CHD1 MEL cell line mm9* 5.470044e-11 4.463556e-09 49 upregulated targets
28 ZNF274 K562 hg19* 2.918737e-10 1.132748e-08 59 downregulated targets
29 SPI1 HL-60 hg19* 3.748289e-10 1.388571e-08 69 downregulated targets
30 POLR2AphosphoS2 A549 hg19* 5.325321e-10 1.887016e-08 95 downregulated targets
31 EP300 MEL cell line mm9* 5.325321e-10 2.896975e-08 95 upregulated targets
32 UBTF MEL cell line mm9* 5.325321e-10 2.896975e-08 95 upregulated targets
33 EP300 CH12.LX mm9* 5.325321e-10 2.896975e-08 95 upregulated targets
34 STAT2 K562 hg19* 1.150197e-09 3.791522e-08 27 downregulated targets
35 GTF2B K562 hg19* 1.163044e-09 3.791522e-08 94 downregulated targets
36 TAL1 G1E-ER4 mm9* 1.418953e-09 4.447871e-08 51 downregulated targets
37 CHD1 IMR-90 hg19* 1.163044e-09 5.582609e-08 94 upregulated targets
38 BCL11A GM12878 hg19* 2.025461e-09 5.917632e-08 65 downregulated targets
39 GATA1 erythroblast hg19* 2.033051e-09 5.917632e-08 102 downregulated targets
40 GATA1 G1E-ER4 mm9* 1.544295e-09 7.000803e-08 154 upregulated targets
41 NFATC1 GM12878 hg19* 2.760804e-09 7.758810e-08 77 downregulated targets
42 RELA GM10847 hg19* 1.118391e-08 2.848402e-07 91 downregulated targets
43 IRF1 K562 hg19* 2.396682e-08 5.744987e-07 167 downregulated targets
44 TAL1 erythroblast mm9* 2.660270e-08 6.194628e-07 51 downregulated targets
45 TCF3 myocyte mm9* 1.701514e-08 6.311069e-07 90 upregulated targets
46 SPI1 GM12891 hg19* 2.764268e-08 9.807144e-07 58 upregulated targets
47 POLR2A kidney mm9* 4.726796e-08 1.041173e-06 89 downregulated targets
48 MEF2C GM12878 hg19* 6.799267e-08 1.458264e-06 18 downregulated targets
49 TAF1 GM12878 hg19* 7.684761e-08 1.605918e-06 65 downregulated targets
50 SP1 K562 hg19* 8.168863e-08 1.664406e-06 63 downregulated targets

C. ARCHS4 (coexpressed genes)¶

Rank Transcription Factor P-value FDR Target
1 ZNF467 human tf ARCHS4 coexpression* 2.269103e-84 3.439961e-81 99 downregulated targets
2 BCL6 human tf ARCHS4 coexpression* 2.856892e-80 1.082762e-77 96 downregulated targets
3 RARA human tf ARCHS4 coexpression* 2.856892e-80 1.082762e-77 96 downregulated targets
4 DHX34 human tf ARCHS4 coexpression* 2.856892e-80 1.082762e-77 96 downregulated targets
5 SPI1 human tf ARCHS4 coexpression* 6.414466e-79 1.620722e-76 95 downregulated targets
6 SNAI3 human tf ARCHS4 coexpression* 6.414466e-79 1.620722e-76 95 downregulated targets
7 IRF9 human tf ARCHS4 coexpression* 6.520025e-75 1.235545e-72 92 downregulated targets
8 TIGD3 human tf ARCHS4 coexpression* 6.520025e-75 1.235545e-72 92 downregulated targets
9 TFEB human tf ARCHS4 coexpression* 1.362245e-73 2.294626e-71 91 downregulated targets
10 LYL1 human tf ARCHS4 coexpression* 1.113436e-69 1.406641e-67 88 downregulated targets
11 IRF2 human tf ARCHS4 coexpression* 1.113436e-69 1.406641e-67 88 downregulated targets
12 RNF166 human tf ARCHS4 coexpression* 1.113436e-69 1.406641e-67 88 downregulated targets
13 USF1 human tf ARCHS4 coexpression* 2.161708e-68 2.184766e-66 87 downregulated targets
14 STAT5B human tf ARCHS4 coexpression* 2.161708e-68 2.184766e-66 87 downregulated targets
15 ZNF319 human tf ARCHS4 coexpression* 2.161708e-68 2.184766e-66 87 downregulated targets
16 NFE2 human tf ARCHS4 coexpression* 4.119596e-67 3.469615e-65 86 downregulated targets
17 ZBTB7B human tf ARCHS4 coexpression* 4.119596e-67 3.469615e-65 86 downregulated targets
18 STAT6 human tf ARCHS4 coexpression* 4.119596e-67 3.469615e-65 86 downregulated targets
19 RXRA human tf ARCHS4 coexpression* 1.414385e-64 1.072104e-62 84 downregulated targets
20 ZBP1 human tf ARCHS4 coexpression* 1.414385e-64 1.072104e-62 84 downregulated targets
21 IRF1 human tf ARCHS4 coexpression* 2.547587e-63 1.609226e-61 83 downregulated targets
22 IRF7 human tf ARCHS4 coexpression* 2.547587e-63 1.609226e-61 83 downregulated targets
23 ZBTB48 human tf ARCHS4 coexpression* 2.547587e-63 1.609226e-61 83 downregulated targets
24 SEMA4A human tf ARCHS4 coexpression* 2.547587e-63 1.609226e-61 83 downregulated targets
25 ZNF524 human tf ARCHS4 coexpression* 4.502238e-62 2.625151e-60 82 downregulated targets
26 TRAFD1 human tf ARCHS4 coexpression* 4.502238e-62 2.625151e-60 82 downregulated targets
27 AKNA human tf ARCHS4 coexpression* 1.327508e-59 7.453710e-58 80 downregulated targets
28 MXD1 human tf ARCHS4 coexpression* 2.214316e-58 1.118968e-56 79 downregulated targets
29 ELF4 human tf ARCHS4 coexpression* 2.214316e-58 1.118968e-56 79 downregulated targets
30 ZNF710 human tf ARCHS4 coexpression* 2.214316e-58 1.118968e-56 79 downregulated targets
31 ZNF746 human tf ARCHS4 coexpression* 3.622156e-57 1.771351e-55 78 downregulated targets
32 TSC22D4 human tf ARCHS4 coexpression* 9.136189e-55 4.328270e-53 76 downregulated targets
33 NR1H2 human tf ARCHS4 coexpression* 1.408363e-53 6.469933e-52 75 downregulated targets
34 KLF2 human tf ARCHS4 coexpression* 3.150631e-51 1.404811e-49 73 downregulated targets
35 FOXO4 human tf ARCHS4 coexpression* 4.570861e-50 1.924840e-48 72 downregulated targets
36 MXD3 human tf ARCHS4 coexpression* 4.570861e-50 1.924840e-48 72 downregulated targets
37 MBNL1 human tf ARCHS4 coexpression* 3.150631e-51 4.861424e-48 73 upregulated targets
38 FLI1 human tf ARCHS4 coexpression* 6.496490e-49 2.661805e-47 71 downregulated targets
39 ASCL2 human tf ARCHS4 coexpression* 9.044150e-48 3.427733e-46 70 downregulated targets
40 PARP12 human tf ARCHS4 coexpression* 9.044150e-48 3.427733e-46 70 downregulated targets
41 SSH2 human tf ARCHS4 coexpression* 9.044150e-48 3.427733e-46 70 downregulated targets
42 NFYC human tf ARCHS4 coexpression* 1.233077e-46 4.559379e-45 69 downregulated targets
43 PLXNC1 human tf ARCHS4 coexpression* 1.646150e-45 5.941819e-44 68 downregulated targets
44 BATF2 human tf ARCHS4 coexpression* 2.151422e-44 7.247901e-43 67 downregulated targets
45 SP110 human tf ARCHS4 coexpression* 2.151422e-44 7.247901e-43 67 downregulated targets
46 ZNF276 human tf ARCHS4 coexpression* 2.151422e-44 7.247901e-43 67 downregulated targets
47 MKRN1 human tf ARCHS4 coexpression* 2.151422e-44 1.659822e-41 67 upregulated targets
48 ATXN7 human tf ARCHS4 coexpression* 2.752176e-43 1.415536e-40 66 upregulated targets
49 RELB human tf ARCHS4 coexpression* 4.220046e-41 1.361189e-39 64 downregulated targets
50 SP2 human tf ARCHS4 coexpression* 4.220046e-41 1.361189e-39 64 downregulated targets

**Table 5 | Transcription Factor Enrichment Analysis Results.** The figure contains scrollable tables displaying the results of the Transcription Factor (TF) enrichment analysis generated using Enrichr. Every row represents a TF; significant TFs are highlighted in bold. A and B display results generated using ChEA and ENCODE libraries, indicating TFs whose experimentally validated targets are enriched. C displays results generated using the ARCHS4 library, indicating TFs whose top coexpressed genes (according to the ARCHS4 dataset) are enriched.


12. Kinase Enrichment Analysis¶

Protein kinases are enzymes that modify other proteins by chemically adding phosphate groups. Databases such as KEA contain a large number of associations between kinases and their substrates. This information can be leveraged by Enrichr to identify the protein kinases whose substrates are over-represented in the up-regulated and down-regulated genes identified by comparing two groups of samples.
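
As an illustration of what "over-represented" means here, the sketch below computes a one-sided hypergeometric (Fisher-type) p-value for the overlap between a query gene set and a kinase's substrate set. It is a minimal sketch and not Enrichr's actual implementation; the function name, the universe size of 20,000 genes, and the example counts are hypothetical values chosen for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(overlap, set_size, query_size, universe_size):
    """Return P(X >= overlap) for a hypergeometric draw.

    query_size genes are drawn from universe_size genes, of which
    set_size are annotated substrates of the kinase.
    """
    max_k = min(set_size, query_size)
    total = comb(universe_size, query_size)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


# Hypothetical example: 27 of 500 query genes fall in a 200-gene substrate set.
p_example = hypergeom_enrichment_p(
    overlap=27, set_size=200, query_size=500, universe_size=20000
)
# Small case that can be checked by hand: 1 - C(7,2)/C(10,2) = 24/45.
p_small = hypergeom_enrichment_p(
    overlap=1, set_size=3, query_size=2, universe_size=10
)
```

With these hypothetical counts, about 5 overlapping genes would be expected by chance, so an overlap of 27 yields a very small p-value.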

In [15]:
# Initialize results
results['kinase_enrichment'] = {}

# Loop through results
for label, enrichr_results in results['enrichr'].items():

    # Run analysis
    results['kinase_enrichment'][label] = analyze(enrichr_results=enrichr_results['results'], tool='kinase_enrichment', signature_label=label)

    # Display results
    plot(results['kinase_enrichment'][label])

A. KEA (experimentally validated targets)¶

Rank Kinase P-value FDR Substrate
1 AKT1* 3.216180e-12 6.496684e-10 27 upregulated substrates
2 CDK2* 5.971464e-10 6.031178e-08 41 upregulated substrates
3 IGF1R* 2.716398e-08 1.829042e-06 14 upregulated substrates
4 MAPK1* 2.204606e-07 9.068485e-06 26 upregulated substrates
5 MAPK14* 2.709145e-07 9.068485e-06 29 upregulated substrates
6 PRKCA* 2.715555e-07 9.068485e-06 31 upregulated substrates
7 MAP3K1* 3.502413e-07 9.068485e-06 6 upregulated substrates
8 PRKCB* 3.591479e-07 9.068485e-06 22 upregulated substrates
9 SRC* 6.571651e-07 1.474971e-05 22 upregulated substrates
10 EGFR* 1.629895e-06 3.292388e-05 11 upregulated substrates
11 MAPK3* 3.469925e-06 6.130564e-05 18 upregulated substrates
12 RPS6KA3* 3.641919e-06 6.130564e-05 24 upregulated substrates
13 SYK* 4.487676e-06 6.973157e-05 8 upregulated substrates
14 CDK1* 7.924293e-06 1.143362e-04 30 upregulated substrates
15 PAK1* 2.259711e-05 3.043077e-04 8 upregulated substrates
16 LCK* 2.774527e-05 3.502841e-04 11 upregulated substrates
17 LYN* 3.720993e-05 4.421416e-04 10 upregulated substrates
18 PRKCD* 5.866212e-05 6.583193e-04 11 upregulated substrates
19 GSK3B* 6.762070e-05 6.877256e-04 29 upregulated substrates
20 CSNK2A2* 6.809165e-05 6.877256e-04 17 upregulated substrates
21 MAPK9* 1.121930e-04 1.079190e-03 12 upregulated substrates
22 PRKACA* 1.556999e-04 1.429608e-03 23 upregulated substrates
23 ROCK1* 2.076179e-04 1.823426e-03 6 upregulated substrates
24 SGK3* 2.306062e-04 1.940936e-03 4 upregulated substrates
25 SGK1* 2.860809e-04 2.311534e-03 8 upregulated substrates
26 MAPK10* 3.010921e-04 2.339254e-03 10 upregulated substrates
27 CAMK2A* 4.168418e-04 3.118594e-03 9 upregulated substrates
28 PRKACG* 5.001862e-04 3.529580e-03 13 upregulated substrates
29 HCK* 5.067219e-04 3.529580e-03 6 upregulated substrates
30 INSR* 5.428695e-04 3.655321e-03 10 upregulated substrates
31 FYN* 6.235543e-04 4.063160e-03 10 upregulated substrates
32 RPS6KA1* 3.272161e-05 4.122922e-03 9 downregulated substrates
33 CDK14* 6.586975e-04 4.158028e-03 6 upregulated substrates
34 ERBB2* 7.089724e-04 4.339770e-03 4 upregulated substrates
35 PIM1* 8.936428e-04 5.309290e-03 4 upregulated substrates
36 MAPKAPK2* 1.066534e-03 6.155425e-03 6 upregulated substrates
37 MAP3K8* 1.109747e-03 6.226913e-03 4 upregulated substrates
38 CSNK1A1* 1.192183e-03 6.508672e-03 8 upregulated substrates
39 CSNK1E* 1.424385e-03 7.571732e-03 10 upregulated substrates
40 JAK2* 1.463060e-03 7.577899e-03 5 upregulated substrates
41 STK4* 1.634778e-03 8.150391e-03 3 upregulated substrates
42 MAPK8* 1.654287e-03 8.150391e-03 14 upregulated substrates
43 PRKDC* 1.961467e-03 9.433723e-03 12 upregulated substrates
44 MAP3K5* 2.206288e-03 1.036442e-02 3 upregulated substrates
45 CSNK2A1* 2.382484e-03 1.093777e-02 16 upregulated substrates
46 ABL1* 2.452468e-03 1.100886e-02 8 upregulated substrates
47 ROCK2* 2.756202e-03 1.210332e-02 4 upregulated substrates
48 MTOR* 3.350695e-03 1.440086e-02 7 upregulated substrates
49 KSR2* 3.619519e-03 1.462286e-02 2 upregulated substrates
50 MAP3K2* 3.619519e-03 1.462286e-02 2 upregulated substrates

B. ARCHS4 (coexpressed genes)¶

Rank Kinase P-value FDR Substrate
1 STK40 human kinase ARCHS4 coexpression* 2.269103e-84 1.050595e-81 99 downregulated substrates
2 LIMK2 human kinase ARCHS4 coexpression* 6.414466e-79 1.484949e-76 95 downregulated substrates
3 PRKD2 human kinase ARCHS4 coexpression* 3.064599e-76 3.547273e-74 93 downregulated substrates
4 HCK human kinase ARCHS4 coexpression* 3.064599e-76 3.547273e-74 93 downregulated substrates
5 PRKCD human kinase ARCHS4 coexpression* 1.362245e-73 1.051199e-71 91 downregulated substrates
6 NUAK2 human kinase ARCHS4 coexpression* 1.362245e-73 1.051199e-71 91 downregulated substrates
7 MAP3K3 human kinase ARCHS4 coexpression* 2.794815e-72 1.617499e-70 90 downregulated substrates
8 FGR human kinase ARCHS4 coexpression* 2.794815e-72 1.617499e-70 90 downregulated substrates
9 RPS6KA1 human kinase ARCHS4 coexpression* 1.113436e-69 5.155210e-68 88 downregulated substrates
10 RIPK3 human kinase ARCHS4 coexpression* 1.113436e-69 5.155210e-68 88 downregulated substrates
11 GRK6 human kinase ARCHS4 coexpression* 2.161708e-68 8.340588e-67 87 downregulated substrates
12 MAP3K11 human kinase ARCHS4 coexpression* 2.161708e-68 8.340588e-67 87 downregulated substrates
13 FES human kinase ARCHS4 coexpression* 4.119596e-67 1.362409e-65 86 downregulated substrates
14 MLKL human kinase ARCHS4 coexpression* 4.119596e-67 1.362409e-65 86 downregulated substrates
15 RAF1 human kinase ARCHS4 coexpression* 1.414385e-64 4.092876e-63 84 downregulated substrates
16 LYN human kinase ARCHS4 coexpression* 1.414385e-64 4.092876e-63 84 downregulated substrates
17 MAPK14 human kinase ARCHS4 coexpression* 2.547587e-63 6.938428e-62 83 downregulated substrates
18 TYK2 human kinase ARCHS4 coexpression* 1.327508e-59 3.414645e-58 80 downregulated substrates
19 GRK2 human kinase ARCHS4 coexpression* 2.214316e-58 5.126142e-57 79 downregulated substrates
20 CSK human kinase ARCHS4 coexpression* 2.214316e-58 5.126142e-57 79 downregulated substrates
21 MAP2K3 human kinase ARCHS4 coexpression* 5.809813e-56 1.280926e-54 77 downregulated substrates
22 STK10 human kinase ARCHS4 coexpression* 9.136189e-55 1.839155e-53 76 downregulated substrates
23 JAK3 human kinase ARCHS4 coexpression* 9.136189e-55 1.839155e-53 76 downregulated substrates
24 STRADB human kinase ARCHS4 coexpression* 1.408363e-53 2.716967e-52 75 downregulated substrates
25 MAPKAPK3 human kinase ARCHS4 coexpression* 3.150631e-51 5.834969e-50 73 downregulated substrates
26 PTK2B human kinase ARCHS4 coexpression* 4.570861e-50 8.139649e-49 72 downregulated substrates
27 MAP3K1 human kinase ARCHS4 coexpression* 4.570861e-50 2.125450e-47 72 upregulated substrates
28 MKNK1 human kinase ARCHS4 coexpression* 9.044150e-48 1.550904e-46 70 downregulated substrates
29 STK38 human kinase ARCHS4 coexpression* 1.233077e-46 1.968672e-45 69 downregulated substrates
30 IRAK3 human kinase ARCHS4 coexpression* 1.233077e-46 1.968672e-45 69 downregulated substrates
31 PIM1 human kinase ARCHS4 coexpression* 1.646150e-45 2.458605e-44 68 downregulated substrates
32 ARAF human kinase ARCHS4 coexpression* 1.646150e-45 2.458605e-44 68 downregulated substrates
33 IKBKE human kinase ARCHS4 coexpression* 3.445372e-42 4.985023e-41 65 downregulated substrates
34 ATM human kinase ARCHS4 coexpression* 2.752176e-43 6.398809e-41 66 upregulated substrates
35 JAK1 human kinase ARCHS4 coexpression* 3.445372e-42 5.340327e-40 65 upregulated substrates
36 PAK2 human kinase ARCHS4 coexpression* 4.220046e-41 4.905804e-39 64 upregulated substrates
37 MARK2 human kinase ARCHS4 coexpression* 5.056246e-40 6.885418e-39 63 downregulated substrates
38 SYK human kinase ARCHS4 coexpression* 5.056246e-40 6.885418e-39 63 downregulated substrates
39 SNRK human kinase ARCHS4 coexpression* 5.056246e-40 4.702309e-38 63 upregulated substrates
40 STK17B human kinase ARCHS4 coexpression* 5.924813e-39 3.935768e-37 62 upregulated substrates
41 MAP3K2 human kinase ARCHS4 coexpression* 5.924813e-39 3.935768e-37 62 upregulated substrates
42 STRADA human kinase ARCHS4 coexpression* 6.788262e-38 8.730459e-37 61 downregulated substrates
43 ERN1 human kinase ARCHS4 coexpression* 6.788262e-38 8.730459e-37 61 downregulated substrates
44 STK4 human kinase ARCHS4 coexpression* 6.788262e-38 3.945677e-36 61 upregulated substrates
45 MAP4K2 human kinase ARCHS4 coexpression* 7.602891e-37 9.263522e-36 60 downregulated substrates
46 LTK human kinase ARCHS4 coexpression* 7.602891e-37 9.263522e-36 60 downregulated substrates
47 RIOK3 human kinase ARCHS4 coexpression* 8.322044e-36 9.879760e-35 59 downregulated substrates
48 PRKCH human kinase ARCHS4 coexpression* 8.322044e-36 3.869751e-34 59 upregulated substrates
49 ROCK1 human kinase ARCHS4 coexpression* 8.322044e-36 3.869751e-34 59 upregulated substrates
50 MAST3 human kinase ARCHS4 coexpression* 8.900276e-35 9.811494e-34 58 downregulated substrates

**Table 6 | Kinase Enrichment Analysis Results.** The figure contains browsable tables displaying the results of the Protein Kinase (PK) enrichment analysis generated using Enrichr. Every row represents a PK; significant PKs are highlighted in bold. A displays results generated using KEA, indicating PKs whose experimentally validated substrates are enriched. B displays results generated using the ARCHS4 library, indicating PKs whose top coexpressed genes (according to the ARCHS4 dataset) are enriched.


13. L1000CDS2 Query¶

L1000CDS2 is a web-based tool for querying gene expression signatures against signatures created from human cell lines treated with over 20,000 small molecules and drugs for the LINCS project. It is commonly used to identify small molecules which mimic or reverse the effects of a gene expression signature generated from a differential gene expression analysis.

In [16]:
# Initialize results
results['l1000cds2'] = {}

# Loop through signatures
for label, signature in signatures.items():

    # Run analysis
    results['l1000cds2'][label] = analyze(signature=signature, tool='l1000cds2', signature_label=label, plot_type='interactive')

    # Display results
    plot(results['l1000cds2'][label])

Control vs Perturbation signature:¶

L1000CDS2 Links:

Mimic Signature Query Results: https://maayanlab.cloud/L1000CDS2/#/result/6951a376da44360053834b37

Reverse Signature Query Results: https://maayanlab.cloud/L1000CDS2/#/result/6951a376da44360053834b39


**Figure 8 | L1000CDS2 Query results.** The figure contains an interactive bar chart displaying the top small molecules identified by the L1000CDS2 query. The left panel displays the small molecules which mimic the observed gene expression signature, while the right panel displays the small molecules which reverse it. Links to the L1000CDS2 web server are additionally provided, allowing users to interactively explore the analysis results. If you are experiencing issues visualizing the plot, please visit our Troubleshooting guide.


14. L1000FWD Query¶

L1000FWD is a web-based tool for querying gene expression signatures against signatures created from human cell lines treated with over 20,000 small molecules and drugs for the LINCS project.

In [17]:
# Initialize results
results['l1000fwd'] = {}

# Loop through signatures
for label, signature in signatures.items():

    # Run analysis
    results['l1000fwd'][label] = analyze(signature=signature, tool='l1000fwd', signature_label=label)

    # Display results
    plot(results['l1000fwd'][label])

**Similar Signatures:**

Rank Signature ID P-value FDR Z-score Combined Score
1 CPC006_RMUGS_6H:BRD-K00088062-001-02-1:40 0.000005 0.032997 -1.668304 8.788569
2 CPC017_HT29_6H:BRD-K09549677-300-01-8:10 0.000014 0.048747 -1.801399 8.767109
3 CPC006_A375_6H:BRD-K20755323-001-02-6:40 0.000020 0.056127 -1.658470 7.805216
4 CPC019_VCAP_6H:BRD-K94544211-001-01-2:10 0.000033 0.075523 -1.834169 8.210300
5 CPC006_VCAP_6H:BRD-A79768653-001-02-1:10 0.000094 0.138704 -1.672292 6.734402
6 CPC012_SKB_24H:BRD-K03644760-001-01-5:10 0.000170 0.191642 -1.725651 6.504429
7 CPC016_A375_6H:BRD-A17065207-001-06-9:10 0.000343 0.312032 -1.814992 6.289378
8 CPC012_HT29_6H:BRD-K82971429-001-01-9:10 0.000389 0.333313 -1.785462 6.087913
9 CPC006_A375_6H:BRD-A18763547-300-04-8:10 0.000495 0.378274 -1.637077 5.411425
10 CPC014_HCC515_6H:BRD-K72420232-001-01-6:10 0.000718 0.480366 -1.745217 5.486585
11 CPC019_A375_6H:BRD-K05197617-001-05-8:10 0.001039 0.529745 -1.783683 5.321065
12 HOG002_MCF7_24H:BRD-K20755323-001-02-6:10 0.001079 0.537760 -1.801748 5.345398
13 CPC006_PC3_24H:BRD-K23875128-001-04-2:10 0.001130 0.545670 -1.694631 4.994029
14 CPC003_PC3_6H:BRD-K37691127-001-02-2:10 0.001142 0.545670 -1.656374 4.873610
15 CPC006_HT29_6H:BRD-K41087962-001-01-7:0.63 0.001733 0.656401 -1.626943 4.492304
16 CPC011_A549_24H:BRD-A23359898-001-06-2:10 0.001857 0.667857 -1.708746 4.667107
17 CPC009_HT29_6H:BRD-K53732802-019-01-9:10 0.002394 0.770521 -1.684286 4.414343
18 CPC003_VCAP_6H:BRD-K36007650-300-02-3:10 0.002676 0.812540 -1.628237 4.188592
19 CPC006_HT29_24H:BRD-K76703230-001-01-3:0.31 0.002870 0.853255 -1.684035 4.280975
20 CPC016_NPC_24H:BRD-K81729199-001-01-0:10 0.002906 0.858021 -1.729485 4.387131
21 CPC019_PC3_6H:BRD-K60070073-001-02-3:10 0.003347 0.924272 -1.825995 4.520074
22 CPC005_A549_24H:BRD-A47513740-001-02-5:10 0.005587 1.000000 -1.662500 3.745287
23 CPC006_A375_6H:BRD-A43331270-001-01-6:10 0.005603 1.000000 -1.677913 3.778014
24 CPC011_HT29_6H:BRD-K37270826-001-20-1:10 0.005857 1.000000 -1.747471 3.900950
25 CPC014_VCAP_6H:BRD-K50168500-001-01-2:10 0.006720 1.000000 -1.767657 3.840506
26 CPC004_HA1E_24H:BRD-K70327191-001-01-4:10 0.006873 1.000000 -1.609854 3.481865
27 CPC007_PC3_24H:BRD-K49814456-001-09-2:10 0.007447 1.000000 -1.691662 3.599838
28 CPC009_VCAP_24H:BRD-A05565054-001-01-7:10 0.007447 1.000000 -1.698480 3.614348
29 CPC010_PC3_6H:BRD-K30697463-001-15-0:10 0.008787 1.000000 -1.683134 3.460828
30 CPC020_A375_6H:BRD-A88774919-001-02-8:10 0.009000 1.000000 -1.774839 3.630912
31 CPC006_PC3_24H:BRD-K61662457-001-02-2:20 0.009397 1.000000 -1.667230 3.379500
32 CPC019_PC3_6H:BRD-K84106030-001-01-1:10 0.009788 1.000000 -1.775378 3.567239
33 CPC001_HCC515_24H:BRD-K12906962-001-02-1:10 0.009880 1.000000 -1.633461 3.275520
34 CPC019_HT29_6H:BRD-K45253154-001-04-3:10 0.010128 1.000000 -1.787531 3.565171
35 CPC005_PC3_24H:BRD-K43405658-001-01-8:10 0.010272 1.000000 -1.638462 3.257810
36 CPC018_HEPG2_6H:BRD-K96799727-001-01-7:10 0.011449 1.000000 -1.787635 3.470222
37 CPC006_MCF7_6H:BRD-K89732114-001-05-9:10 0.011567 1.000000 -1.641471 3.179177
38 CPC016_SKB_24H:BRD-K38477985-001-01-8:10 0.015187 1.000000 -1.731162 3.148143
39 CPC012_HT29_6H:BRD-K98004941-001-01-0:10 0.015188 1.000000 -1.730055 3.146126
40 CPC002_HA1E_24H:BRD-K60640630-001-03-7:10 0.015555 1.000000 -1.586161 2.867982
41 CPC012_ASC_24H:BRD-K88556033-001-01-0:10 0.015895 1.000000 -1.711952 3.079334
42 LJP001_BT20_24H:BRD-K49810818-001-02-8:10 0.015902 1.000000 -1.760592 3.166501
43 CPC015_MCF7_24H:BRD-K42635745-001-02-4:10 0.016629 1.000000 -1.767551 3.144702
44 CPC008_HT29_6H:BRD-K86992982-001-04-4:10 0.017780 1.000000 -1.661370 2.907526
45 CPC014_VCAP_24H:BRD-K81528515-001-03-9:10 0.018591 1.000000 -1.790000 3.097938
46 CPC014_A549_6H:BRD-K33551950-001-01-6:10 0.019447 1.000000 -1.781109 3.047753
47 CPC011_VCAP_6H:BRD-K60770992-066-21-9:10 0.023998 1.000000 -1.684888 2.729238
48 CPC008_VCAP_6H:BRD-K41185612-001-01-3:10 0.029783 1.000000 -1.679521 2.562994
49 CPC015_HEPG2_6H:BRD-K37991163-003-06-8:10 0.031160 1.000000 -1.701976 2.563856
50 CPC004_VCAP_24H:BRD-A28746609-001-05-7:10 0.031772 1.000000 -1.586507 2.376526

**Opposite Signatures:**

Rank Signature ID P-value FDR Z-score Combined Score
1 CPC013_SKB_24H:BRD-K41925105-001-01-6:10 9.377789e-10 0.000028 1.856747 -16.762527
2 CPC014_HT29_6H:BRD-A80960055-001-01-7:10 6.296264e-07 0.005391 1.744277 -10.816115
3 CPC005_A375_6H:BRD-A18419789-001-01-4:10 6.851862e-06 0.033413 1.860728 -9.609156
4 PCLB002_HEPG2_24H:BRD-K02130563:0.37 7.024576e-06 0.033413 1.615133 -8.323392
5 CPC006_T3M10_6H:BRD-K06792661-001-01-9:10 1.242533e-05 0.048747 1.869785 -9.172591
6 CPC019_HT29_6H:BRD-K98426715-001-01-9:10 1.803479e-05 0.055147 1.771416 -8.403399
7 CPC014_HT29_6H:BRD-K85493820-001-01-6:10 2.421599e-05 0.060980 1.736553 -8.015753
8 CPC004_HT29_6H:BRD-A25687296-300-03-5:10 3.351935e-05 0.075523 1.820056 -8.144213
9 CPC014_HA1E_6H:BRD-A80960055-001-01-7:10 4.437048e-05 0.090450 1.723755 -7.503342
10 CPC002_PC3_6H:BRD-A80502530-001-01-2:10 4.997552e-05 0.093067 1.885537 -8.110152
11 CPC004_PC3_6H:BRD-K98490050-001-01-8:10 5.000233e-05 0.093067 1.821365 -7.833711
12 CPC009_A375_6H:BRD-K03857568-001-14-0:10 6.350874e-05 0.108924 1.837353 -7.711676
13 CPC014_HT29_6H:BRD-K80622725-001-10-2:10 6.615459e-05 0.108924 1.776753 -7.425833
14 CPC007_HT29_6H:BRD-K78843060-019-02-0:10 1.113100e-04 0.153712 1.890195 -7.472822
15 CPC004_PC3_24H:BRD-A35588707-001-03-0:10 1.342052e-04 0.179537 1.849252 -7.160732
16 CPC013_HA1E_6H:BRD-K80346834-001-01-5:10 1.393792e-04 0.180809 1.755700 -6.769631
17 CPC010_PC3_6H:BRD-K28916077-001-04-0:10 1.502656e-04 0.189198 1.757476 -6.719076
18 PCLB002_HEPG2_24H:BRD-K02130563:10 1.640911e-04 0.191642 1.626646 -6.156718
19 LJP001_SKBR3_6H:BRD-K99252563-001-01-1:10 2.330728e-04 0.232038 1.651500 -5.999089
20 CPC006_SW620_6H:BRD-K06792661-001-01-9:10 3.087887e-04 0.293754 1.787439 -6.274516
21 CPC014_A549_24H:BRD-A26002865-001-01-5:10 3.542609e-04 0.315949 1.715859 -5.920876
22 CPC002_PC3_24H:BRD-K08547377-003-03-2:10 3.804512e-04 0.332382 1.871616 -6.400367
23 CPC004_A375_6H:BRD-K98490050-001-01-8:10 4.077091e-04 0.335647 1.856805 -6.293917
24 CPC019_A375_6H:BRD-K98824517-001-06-4:10 4.853755e-04 0.377790 1.705292 -5.651205
25 CPC015_A549_24H:BRD-K92093830-003-05-0:10 5.112866e-04 0.383994 1.705879 -5.614621
26 CPC012_MCF7_24H:BRD-K74761218-001-03-5:10 5.714976e-04 0.414665 1.757586 -5.699826
27 CPC016_NPC_24H:BRD-A19037878:10 6.732211e-04 0.464837 1.712722 -5.432482
28 CPC016_MCF7_6H:BRD-K91370081-001-10-3:10 7.050422e-04 0.479082 1.729834 -5.452064
29 CPC010_PC3_24H:BRD-A24643465-001-05-3:10 7.415868e-04 0.487843 1.754433 -5.491091
30 CPC011_VCAP_24H:BRD-A11990600-001-02-6:10 7.977059e-04 0.487843 1.733547 -5.370800
31 CPC006_U937_6H:BRD-K78126613-001-16-0:10 8.967437e-04 0.518766 1.775656 -5.411014
32 CPC010_A375_6H:BRD-K93034159-001-20-9:10 9.450331e-04 0.529745 1.763403 -5.333505
33 CPC014_MCF7_6H:BRD-A26002865-001-01-5:10 9.544044e-04 0.529745 1.726930 -5.215792
34 CPC016_A375_6H:BRD-K08547377-003-03-2:10 1.025448e-03 0.529745 1.737293 -5.192920
35 CPC006_TYKNU_6H:BRD-A36630025-001-02-6:0.35 1.332391e-03 0.582024 1.779220 -5.115912
36 CPC017_A549_6H:BRD-K06426971-001-01-9:10 1.543943e-03 0.650376 1.662098 -4.672770
37 CPC007_HT29_6H:BRD-K03067624-003-19-3:10 1.549637e-03 0.650376 1.796025 -5.046418
38 CPC002_PC3_6H:BRD-A69960130-066-01-4:10 1.596569e-03 0.651934 1.838885 -5.143015
39 CPC002_PC3_24H:BRD-K34608650-001-01-6:10 1.644721e-03 0.651934 1.820951 -5.069359
40 CPC006_A375_6H:BRD-K86682249-001-05-7:10 1.694118e-03 0.654016 1.812360 -5.022152
41 CPC006_SW620_6H:BRD-K04853698-003-01-4:10 1.694118e-03 0.654016 1.761222 -4.880446
42 CVD001_HEPG2_6H:BRD-K06426971-001-01-9:10 1.744786e-03 0.656401 1.672288 -4.612600
43 CPC006_HA1E_24H:BRD-A02481876-001-09-9:60 1.834833e-03 0.667857 1.772045 -4.849031
44 CPC006_RMUGS_6H:BRD-K15409150-001-02-5:30 1.941735e-03 0.686973 1.737236 -4.711054
45 CPC014_A375_6H:BRD-K33551950-001-01-6:10 2.827782e-03 0.852497 1.706767 -4.349787
46 LJP001_SKBR3_6H:BRD-K64890080-001-09-6:10 2.870162e-03 0.853255 1.612652 -4.099511
47 CPC020_PC3_6H:BRD-K58306044-001-01-3:10 3.035519e-03 0.878024 1.660358 -4.180394
48 CPC006_PC3_6H:BRD-K43620258-001-01-6:80 3.298653e-03 0.924272 1.799578 -4.465947
49 CPC014_A549_6H:BRD-K81142122-001-15-8:10 3.484586e-03 0.956228 1.685972 -4.143864
50 CPC010_MCF7_6H:BRD-A24643465-001-05-3:10 9.083642e-03 1.000000 1.694358 -3.459439
Full results available at: https://maayanlab.cloud/l1000fwd/vanilla/result/6951a38dcae467002d25f942.
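
The signature IDs in the tables above appear to follow the pattern `batch_cellline_timepoint:perturbagen:dose`. The sketch below splits an ID into those fields so that results can be grouped by cell line or compound. The field names are assumptions inferred from the shape of the IDs rather than taken from the L1000FWD documentation, and the dose is kept as a plain number because its unit is not stated here.

```python
from typing import NamedTuple


class L1000Signature(NamedTuple):
    batch: str
    cell_line: str
    time_point: str
    perturbagen_id: str
    dose: float


def parse_signature_id(sig_id: str) -> L1000Signature:
    """Split an ID such as 'CPC006_RMUGS_6H:BRD-K00088062-001-02-1:40'."""
    # The colon separates experiment, perturbagen, and dose.
    experiment, perturbagen_id, dose = sig_id.split(":")
    # The experiment part is assumed to be batch_cellline_timepoint.
    batch, cell_line, time_point = experiment.split("_")
    return L1000Signature(batch, cell_line, time_point, perturbagen_id, float(dose))


top_similar = parse_signature_id("CPC006_RMUGS_6H:BRD-K00088062-001-02-1:40")
top_opposite = parse_signature_id("CPC013_SKB_24H:BRD-K41925105-001-01-6:10")
```

For instance, `BRD-A80960055-001-01-7` appears twice among the opposite signatures (ranks 2 and 9), in two different cell lines; parsing the IDs makes such repeats easy to count.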

Methods¶

Data¶

Data Source¶

The dataset was user-submitted, compressed into an HDF5 data package, and uploaded to Google Cloud.

Data Normalization¶
Quantile Normalization¶

Raw counts were normalized using quantile normalization from the DESeq2 R package.

Signature Generation¶

The gene expression signature was generated by comparing gene expression levels between the control group and the experimental group using the limma R package (Ritchie et al., Nucleic Acids Research 2015), available on Bioconductor: http://bioconductor.org/packages/release/bioc/html/limma.html.

PCA¶

Principal Component Analysis was performed using the PCA function from the sklearn Python module. Prior to performing PCA, the raw gene counts were normalized using the quantile method, filtered by selecting the 500 genes with most variable expression, and finally transformed using the Z-score method.
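
The filtering and Z-score steps described above can be sketched in plain Python as follows. This is a minimal sketch, not the notebook's own code: it assumes the input is already normalized, that variance is used as the measure of variability, and that the Z-score is computed per gene across samples. The function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def top_variable_genes(expression, n_genes=500):
    """Return the n_genes gene names with the highest variance across samples."""
    ranked = sorted(
        expression, key=lambda gene: pvariance(expression[gene]), reverse=True
    )
    return ranked[:n_genes]


def zscore(values):
    """Center to mean 0 and scale to unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    # Constant rows have no spread; map them to 0 instead of dividing by zero.
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]


def prepare_for_pca(expression, n_genes=500):
    """Keep the most variable genes and Z-score each of them."""
    keep = top_variable_genes(expression, n_genes)
    return {gene: zscore(expression[gene]) for gene in keep}


# Toy input: gene -> normalized expression in three samples.
toy = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [5.0, 5.0, 5.0],
    "GENE_C": [0.0, 10.0, 20.0],
}
prepared = prepare_for_pca(toy, n_genes=2)
```

The resulting matrix, transposed so that rows are samples, is the kind of input that would then be passed to the sklearn PCA function.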

Clustergrammer¶

The interactive heatmap was generated using Clustergrammer (Fernandez et al., 2017) which is freely available at http://amp.pharm.mssm.edu/clustergrammer/. Prior to displaying the heatmap, the raw gene counts were normalized using the quantile method, filtered by selecting the 500 genes with most variable expression, and finally transformed using the Z-score method.

Library Size Analysis¶

Read counts were calculated by summing each column of the raw gene count matrix. Total counts were subsequently divided by 10^6 and displayed as millions of reads.
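
As a small worked example of this calculation, the sketch below sums the raw counts for each sample and converts the totals to millions of reads. The dictionary layout (sample name mapped to that sample's column of counts) and the toy numbers are assumptions made for illustration.

```python
def library_sizes_in_millions(counts):
    """Sum each sample's raw gene counts and express the total in millions."""
    return {sample: sum(column) / 1e6 for sample, column in counts.items()}


# Toy count matrix: one list of raw gene counts per sample (column).
toy_counts = {
    "sample_1": [1_500_000, 500_000, 0],
    "sample_2": [250_000, 250_000, 500_000],
}
sizes = library_sizes_in_millions(toy_counts)
```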

Differential Expression Table¶

The gene expression signature was generated by performing differential gene expression analysis using the methods described in the Signature Generation section.

Volcano Plot¶

Gene fold changes were transformed using log2 and displayed on the x axis; P-values were corrected using the Benjamini-Hochberg method, transformed using –log10, and displayed on the y axis. See the Signature Generation section for more information on the methods used to generate these values.
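
The two transformations can be sketched as follows: a Benjamini-Hochberg adjustment of the P-values, followed by the –log10 transform used for the y axis. This is a minimal sketch under stated assumptions: the fold changes are taken to be already on the log2 scale, all P-values are greater than zero, and the function names and toy values are hypothetical.

```python
from math import log10


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_coordinates(log2_fc, p_values):
    """Pair each log2 fold change (x) with -log10 of its adjusted P-value (y)."""
    adjusted = benjamini_hochberg(p_values)
    return [(fc, -log10(q)) for fc, q in zip(log2_fc, adjusted)]


toy_p = [0.01, 0.04, 0.03, 0.5]
toy_fc = [2.1, -0.4, 1.3, 0.05]
adjusted_p = benjamini_hochberg(toy_p)
points = volcano_coordinates(toy_fc, toy_p)
```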

MA Plot¶

Average gene expression was calculated as the mean of the normalized gene expression values and displayed on the x axis; gene fold changes were transformed using log2 and displayed on the y axis. For more information on the methods used to generate the signature, see the Signature Generation section.

Enrichr Links¶

The up-regulated and down-regulated gene sets were generated by extracting the 500 genes with the respectively highest and lowest values from the gene expression signature. The gene sets were subsequently submitted to Enrichr (Kuleshov et al., 2016), which is freely available at http://amp.pharm.mssm.edu/Enrichr/, using the gene set upload API. For more information on the methods used to generate the signature, see the Signature Generation section.
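
The gene set extraction step can be sketched as below. This is a minimal sketch that assumes the signature is a mapping from gene symbol to a signed differential expression value; which statistic the notebook ranks by is not stated here, and the function name and toy values are hypothetical. The upload to Enrichr is not shown.

```python
def extract_gene_sets(signature, n_genes=500):
    """Return the n_genes highest- and lowest-scoring genes of a signature."""
    ranked = sorted(signature, key=signature.get, reverse=True)
    # Cap the set size so the two sets cannot overlap on small inputs.
    n = min(n_genes, len(ranked) // 2)
    return {
        "upregulated": ranked[:n],
        "downregulated": ranked[-n:] if n else [],
    }


# Toy signature: gene symbol -> signed differential expression value.
toy_signature = {"A": 2.5, "B": -1.0, "C": 0.3, "D": -3.2, "E": 1.1, "F": -0.2}
gene_sets = extract_gene_sets(toy_signature, n_genes=2)
```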

Gene Ontology Enrichment Analysis¶

Enrichment results were generated by analyzing the up-regulated and down-regulated gene sets using Enrichr. The following libraries were used for the analysis: GO_Biological_Process_2018, GO_Molecular_Function_2018, GO_Cellular_Component_2018. Significant terms are determined by using a cut-off of p-value<0.1 after applying Benjamini-Hochberg correction. For more information on the methods used to perform the enrichment analysis, see the Enrichr section.

Pathway Enrichment Analysis¶

Enrichment results were generated by analyzing the up-regulated and down-regulated gene sets using Enrichr. The following libraries were used for the analysis: KEGG_2016, Reactome_2016, WikiPathways_2016. Significant terms are determined by using a cut-off of p-value<0.1 after applying Benjamini-Hochberg correction. For more information on the methods used to perform the enrichment analysis, see the Enrichr section.

Transcription Factor Enrichment Analysis¶

Enrichment results were generated by analyzing the up-regulated and down-regulated gene sets using Enrichr. The following libraries were used for the analysis: ChEA_2016, ENCODE_TF_ChIP-seq_2015, ARCHS4_TFs_Coexp. Significant results are determined by using a cut-off of p-value<0.1 after applying Benjamini-Hochberg correction. For more information on the methods used to perform the enrichment analysis, see the Enrichr section.

Kinase Enrichment Analysis¶

Enrichment results were generated by analyzing the up-regulated and down-regulated gene sets using Enrichr. The following libraries were used for the analysis: KEA_2015, ARCHS4_Kinases_Coexp. Significant results are determined by using a cut-off of p-value<0.1 after applying Benjamini-Hochberg correction. For more information on the methods used to perform the enrichment analysis, see the Enrichr section.

L1000CDS2 Query¶

The L1000CDS2 analysis (Duan et al., 2016) was performed by submitting the top 2000 genes in the gene expression signature to the L1000CDS2 signature search API. For more information on the methods used to generate the signature, see the Signature Generation section.

L1000FWD Query¶

The L1000FWD analysis (Wang et al., 2018) was performed by submitting the top 2000 genes in the gene expression signature to the L1000FWD signature search API. For more information on the methods used to generate the signature, see the Signature Generation section.


References¶

Duan, Q., Reid, S.P., Clark, N.R., Wang, Z., Fernandez, N.F., Rouillard, A.D., Readhead, B., Tritsch, S.R., Hodos, R., Hafner, M., et al. (2016). L1000CDS2: LINCS L1000 characteristic direction signatures search engine. Npj Systems Biology and Applications 2. doi: https://doi.org/10.1038/npjsba.2016.15

Fernandez, N.F., Gundersen, G.W., Rahman, A., Grimes, M.L., Rikova, K., Hornbeck, P., and Ma'ayan, A. (2017). Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Scientific Data 4, 170151. doi: http://dx.doi.org/10.1038/sdata.2017.151

Kuleshov, M.V., Jones, M.R., Rouillard, A.D., Fernandez, N.F., Duan, Q., Wang, Z., Koplev, S., Jenkins, S.L., Jagodnik, K.M., Lachmann, A., et al. (2016). Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research 44, W90–W97. doi: https://dx.doi.org/10.1093/nar/gkw377

Love, M.I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15. doi: http://doi.org/10.1186/s13059-014-0550-8

Pearson, K. (1901). LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2, 559–572. doi: https://doi.org/10.1080/14786440109462720

Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43, e47–e47. doi: https://doi.org/10.1093/nar/gkv007

Wang, Z., Lachmann, A., Keenan, A.B., and Ma’ayan, A. (2018). L1000FWD: fireworks visualization of drug-induced transcriptomic signatures. Bioinformatics. doi: https://doi.org/10.1093/bioinformatics/bty060


BioJupies is being developed by the Ma'ayan Lab at the Icahn School of Medicine at Mount Sinai
and is an open source project available on GitHub.